If you have an Internet connection, eventually you are going to want to access USENET
and its newsgroups. USENET is one of the most dynamic (and often controversial) aspects of
the Internet. With access to the Internet, you can set up, access, and work with all kinds
of newsgroups, but most Linux users will be interested in using USENET specifically. This
chapter looks at the background of USENET and news services for UNIX in particular, as
well as how the Linux news programs handle the news.
USENET is one of the most misunderstood aspects of the Internet. At the same time, it
is one of the most popular and frequently used aspects of the Internet (with the possible
exception of e-mail). To many users, especially those who don't use the Internet's mail
facilities, USENET is the Internet, and vice versa.
USENET was originally developed to facilitate discussion groups (called newsgroups in
USENET jargon). A newsgroup lets any user with access to the system participate in a
public dialogue with everyone else. By the end of 1995, USENET carried over 9,000
different newsgroups totaling well over 100MB of information every day. USENET is supported
on millions of networks in hundreds of countries and reaches hundreds of millions of
users.
Despite what most people think, USENET is not a formal network or entity. Instead, it
is a number of networked machines that exchange electronic mail (articles) tagged with
predetermined subject headers for specific areas of interest (newsgroups). The articles
are handled as electronic mail messages by most network machines; articles are processed
as news information only by the applications called newsreaders that send and receive the
messages.
Any machine that can attach itself to the Internet either directly, through a gateway,
or through a forwarding service (such as an online service provider) can become part of
USENET. All that is required to use USENET is the software that downloads and uploads the
newsgroup mail and a reader package that lets users read and write articles.
The protocol used to pass USENET messages from one machine to another over TCP/IP
networks is the Network News Transfer Protocol (NNTP). Using NNTP, your
Linux machine can interact with any other machines that handle the news. NNTP software is
an integral part of most Linux versions, so you don't need to purchase or look for
additional software. Indeed, many people establish Linux machines just to access Internet
services like USENET, e-mail, and the World Wide Web.
USENET was developed out of a UNIX release known as UNIX V7, which implemented UUCP
(UNIX to UNIX CoPy) for the first time. As UUCP became popular for communications between
machines, it was expanded with program extensions and supplementary programs. USENET began
at the University of North Carolina, where Steve Bellovin used shell scripts to write the
first version of news software. UNC and Duke used this software to pass messages and
commentary between the two universities. Interest in the news software spread when the UNC
system was described at a Usenix conference in 1980. Steve Daniel was the first to
implement the news software in the C programming language. This version eventually became
the first general release of the news software, which was called release A.
To cope with the increasing volume of messages as new news sites were added to the
expanding informal network, two University of California students, Mark Horton and Matt
Glickman, rewrote the software and added new functionality. After a further revision of
their release B, the news software was generally released in 1982 as version 2.1. From
there, the Center for Seismic Studies' Rick Adams took over maintenance of the software in
1984, at which point it was up to release 2.10.2. One of Rick's first additions was the
capability for moderated newsgroups, resulting in release 2.11 in 1986.
Since then, several contributors have added features to the software, the most
important of which was a complete rewrite of the software undertaken in 1987 by the
University of Toronto's Geoff Collyer and Henry Spencer. Their rewrite greatly increased
the speed with which message mail could be processed and was generally released under the
name C News (from Release C). Over the next few years, the basic news package went through
some minor revisions but has remained true to Collyer and Spencer's version. Important
changes were made to the way machines transferred news messages, and a daemon was added to
process incoming and outgoing postings.
All the versions of news software developed to this point had used UUCP as the
transport. To allow transfer of messages over a network, a protocol called the Network
News Transfer Protocol (NNTP) was developed in 1986. NNTP-based software began to be
refined, and a widely used version was implemented in software written by Brian Barber and
Phil Lapsley, called nntpd. An alternative NNTP system that is widely available is INN
(InterNetNews), which provides a complete news package (user interface and underlying
software).
Apart from the underlying mechanics for transferring messages for newsgroups,
developments also were continuing in the user interface area, where the newsreader exists.
Newsreader software lets you read articles in newsgroups as they arrive. The original
reader was called readnews, and it remains one of the most widely used newsreader
packages, primarily because it is easy to use and is available on practically every UNIX
system.
Several alternative newsreaders were developed, expanding on the features offered by
readnews. Programs such as rn (a more flexible version of readnews), trn (threaded
readnews), and vnews (visual newsreader) are freely distributed now. All are
character-based systems originally developed for UNIX and ported to many other operating
systems. With the popularity of graphical user interfaces, newsreaders were also ported to
these environments, resulting in software such as xrn (X Window System-based readnews). Most of
the readnews variants share a basic command set, although each adds features that may
appeal to some users.
Two types of software are involved in making a news service work on a Linux machine.
The transport software (usually C News for UUCP connections or NNTP for TCP connections)
gets the newsgroups to your machine. The newsreader then assembles and presents the
articles to the user. Newsreaders are only involved in the actual user interface; they
simply pass and receive news articles from the underlying software. For that reason, you
don't need to look at the mechanics of a newsreader to understand how Linux processes
news. The original news system relied completely on UUCP, so much of the news software was
designed for UUCP and then modified later to accommodate alternate methods.
To transfer news from one machine to another, a technique called flooding is used. One
machine calls another and transfers all the news articles. The machine that just received
the news calls another and transfers the articles again. The news articles flow across the
networks by moving from machine to machine instead of all the machines polling a single
main news source. Each machine maintains a list of other sites it can contact to transfer
articles. Each connection to another machine is called a newsfeed.
Each machine can generate new articles as the system's users interact with newsgroups.
When new articles are created, the machine checks its list of newsfeeds and calls them to
transfer the new articles. Because each article generated by a newsreader has a list of the
machines that it has passed through (called the Path), the local machine knows whether the
remote sites on its newsfeed list have already seen the article. As articles move from
machine to machine, each machine adds its own identifier to the article's Path field,
using the UUCP bang-style notation.
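The Path check described above can be sketched in a few lines of Python. This is only an
illustration, not code from any news package: the function names and the site names (duke,
unc, ucbvax, mysite) are made up for the example, and it assumes that each relaying
machine prepends its own name to the front of the bang-separated Path.

```python
def already_seen(path: str, site: str) -> bool:
    """Return True if `site` appears as a component of a bang-style Path."""
    return site in path.split("!")


def add_to_path(path: str, local_site: str) -> str:
    """Prepend the local site name, as a relaying machine would (assumed)."""
    return f"{local_site}!{path}" if path else local_site


def feeds_to_call(path: str, newsfeeds: list[str]) -> list[str]:
    """Keep only the newsfeeds whose names are not already in the Path."""
    return [site for site in newsfeeds if not already_seen(path, site)]


# Hypothetical article that has already passed through duke and unc.
path = add_to_path("duke!unc!alice", "mysite")
print(path)
print(feeds_to_call(path, ["duke", "ucbvax", "unc"]))
```

Here only ucbvax would be called, because duke and unc already appear in the Path.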
An entry in the Distribution field of the header may place a restriction on the
machines that can be sent an article. For example, if you write an article that you want
to stay within your local area network, you can specify this in the Distribution field of
the message when you write it. Then when a newsfeed to a machine outside the local area
network is created, the Distribution field prevents the article from being sent.
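As a rough sketch of how a Distribution restriction might be applied, the following Python
function compares the distributions named in an article with those a newsfeed is willing
to carry. The function name, the distribution names (local, world), and the rule that an
article with no Distribution entry is unrestricted are all assumptions made for this
example; real news software configures this per newsfeed in its own way.

```python
def may_send(article_distribution: str, feed_distributions: set[str]) -> bool:
    """Decide whether an article may go to a newsfeed (simplified sketch)."""
    wanted = {d.strip() for d in article_distribution.split(",") if d.strip()}
    if not wanted:
        # Assumption: no Distribution entry means no restriction.
        return True
    # Send only if the feed carries at least one of the named distributions.
    return bool(wanted & feed_distributions)


print(may_send("local", {"world"}))           # feed outside the local network
print(may_send("local", {"local", "world"}))  # feed inside the local network
```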
To help prevent duplicates of articles moving around USENET, each article has a unique
identifying number called a message ID (which sits in the Message-ID field in the article
header). The message ID is a combination of a unique number and the name of the machine
that the article was originally posted on. Machines use these message ID numbers when a
connection to a newsfeed is established. A history file on each system contains a list of
all message ID numbers that the local system has. When the two machines communicate with
each other, they can check the history file to find out whether the message should be
sent. This process is part of a news transfer protocol called ihave/sendme.
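The message ID and history file can be modeled as follows. This is a minimal sketch: the
class and function names are invented for the example, the host name mysite.example is
hypothetical, and the history is kept in an in-memory set, whereas a real news system
keeps its history in a file on disk.

```python
import itertools


def make_message_id(hostname: str, counter) -> str:
    """Combine a unique number with the posting machine's name."""
    return f"<{next(counter)}@{hostname}>"


class History:
    """In-memory stand-in for the history file of known message IDs."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def offer(self, message_id: str) -> bool:
        """Record a new ID and return True; return False for a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


counter = itertools.count(1)
history = History()
first = make_message_id("mysite.example", counter)
print(first)
print(history.offer(first))  # first arrival is accepted
print(history.offer(first))  # second arrival is a duplicate
```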
With the ihave/sendme protocol, one machine sends a list of all the message ID numbers
it currently has and waits for the other machine to identify the ones it wants. The
requested articles are transferred one at a time in response to sendme messages. Then the process can
be reversed to update the other machine. This type of protocol works well, but it does
involve a lot of overhead in the communications process. For that reason (coupled with the
generally slow lines used by UUCP modem links), ihave/sendme protocols are not often used
when a very large newsgroup transfer has to take place at regular intervals. You wouldn't
want to use ihave/sendme to transfer 100MB of articles every day, for example.
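The exchange can be simulated with two dictionaries standing in for the article stores of
the two machines. This is a simplified model of the idea, not the actual message format:
the function names and the sample message IDs are hypothetical, and the ihave and sendme
steps are ordinary function calls rather than messages sent over a link.

```python
def ihave(sender_articles: dict[str, str]) -> list[str]:
    """List every message ID the sending machine currently holds."""
    return sorted(sender_articles)


def sendme(offered: list[str], receiver_articles: dict[str, str]) -> list[str]:
    """Pick out the offered IDs the receiving machine does not have yet."""
    return [mid for mid in offered if mid not in receiver_articles]


def exchange(sender: dict[str, str], receiver: dict[str, str]) -> list[str]:
    """Run one ihave/sendme round and copy the requested articles across."""
    wanted = sendme(ihave(sender), receiver)
    for mid in wanted:  # articles are transferred one at a time
        receiver[mid] = sender[mid]
    return wanted


site_a = {"<1@a>": "first article", "<2@a>": "second article"}
site_b = {"<2@a>": "second article", "<7@b>": "posted on b"}
print(exchange(site_a, site_b))  # a updates b
print(exchange(site_b, site_a))  # then the process is reversed
```

Every ID is offered in each round even when only one article is missing, which gives a
sense of the overhead involved when the lists are long.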
An alternative method used for large transfers is batching of articles. In this method,
one machine sends everything it has to another machine. The receiving machine then
performs a check of the newly arrived articles to see whether it already has them. By
looking at the message ID number, the machine can discard duplicates. This method tends to
be faster for transferring, although it does impose more processing overhead on the
receiving machine when the machine deals with the newly arrived batch of articles.
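The receiving side of a batch transfer might look something like the sketch below. The
function name and the representation of a batch as a list of (message ID, text) pairs are
assumptions for illustration; real batches are files in a format defined by the news
software.

```python
def unbatch(batch: list[tuple[str, str]], history: set[str]):
    """Accept new articles from a batch and discard those already seen."""
    accepted = []
    discarded = 0
    for message_id, text in batch:
        if message_id in history:
            discarded += 1  # duplicate: already in the history
            continue
        history.add(message_id)
        accepted.append((message_id, text))
    return accepted, discarded


history = {"<1@a>"}
batch = [("<1@a>", "old"), ("<2@a>", "new"), ("<2@a>", "new")]
accepted, discarded = unbatch(batch, history)
print(accepted)
print(discarded)
```

The sender does no checking at all here; all of the duplicate detection falls on the
receiver after the transfer has finished.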
For network-based news access, there are three ways to get articles from another
machine. Using NNTP, articles can be transferred with a technique called pushing the news,
which is similar to the ihave/sendme protocol: the sending machine offers each article, and
the receiving machine accepts only those it does not already have. Your machine can also
request specific newsgroups or articles from the remote based on the date of arrival,
which is called pulling the news. Alternatively, you can interact on an article-by-article
basis with the remote, never downloading the articles to your local machine. This process
is called interactive newsreading, and it works only when you have a newsfeed you can log
in to (which is common these days).
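To make the three approaches more concrete, the sketch below builds the kind of command
lines an NNTP client might send. The command names (IHAVE, NEWNEWS, ARTICLE) and the
date format follow the original NNTP specification (RFC 977) as commonly described, but
the mapping of each command to pushing, pulling, and interactive reading is an
interpretation made for this example, and the helper function names and the sample
newsgroup pattern are invented. Nothing here opens a network connection.

```python
from datetime import datetime


def ihave_command(message_id: str) -> str:
    """Pushing: offer one article to the remote by its message ID."""
    return f"IHAVE {message_id}"


def newnews_command(groups: str, since: datetime) -> str:
    """Pulling: ask which articles arrived in `groups` after `since` (GMT)."""
    stamp = since.strftime("%y%m%d %H%M%S")  # YYMMDD HHMMSS
    return f"NEWNEWS {groups} {stamp} GMT"


def article_command(message_id: str) -> str:
    """Interactive reading or pulling: fetch a single article."""
    return f"ARTICLE {message_id}"


print(ihave_command("<1@mysite.example>"))
print(newnews_command("comp.os.linux.*", datetime(1996, 1, 15, 8, 30, 0)))
print(article_command("<1@mysite.example>"))
```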
This chapter looked at the basics of USENET and news. This information provides a foundation for the next two chapters, which look at NNTP (for network-based access to news) and C News (for UUCP and network access to news). Entire books are written on the subject of USENET and the protocols it uses. Check out one of them if you need more information on this subject.